Day 16 - Regular expressions - Multiple matches


Exercise 16.03

The log file simple.log contains the IP address of the client for each request. IP addresses are made of four numbers separated by dots, in the form A.B.C.D, where each number goes from 0 to 255 (thus having from 1 to 3 digits). Find the 5 IP addresses that occur most often in the file, and print how many times each one occurs.
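The address format described above can be checked more strictly than "1 to 3 digits" allows. The following is a minimal sketch, assuming bash and a grep that supports -E and -q; is_ipv4, octet and ipv4_pattern are hypothetical names introduced here. Each number must fall between 0 and 255, and numbers written with leading zeros, such as 010, are rejected.

```shell
#!/usr/bin/env bash
# One octet: 250-255, 200-249, 100-199, or 0-99 (no leading zeros).
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'
# Four octets separated by literal dots, anchored to the whole string.
ipv4_pattern="^${octet}(\.${octet}){3}\$"

# Hypothetical helper: succeeds only if $1 is a well-formed A.B.C.D address.
is_ipv4() {
    printf '%s\n' "$1" | grep -Eq "$ipv4_pattern"
}

for candidate in 66.249.73.135 10.0.0.1 256.1.1.1 1.2.3; do
    if is_ipv4 "$candidate"; then
        echo "$candidate: valid"
    else
        echo "$candidate: invalid"
    fi
done
```

Only the first two candidates are reported as valid: 256 is out of range, and 1.2.3 has only three numbers.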

Solution

$ grep -Eo "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" simple.log | sort | uniq -c \
    | sort -nr | head -n 5

482 66.249.73.135
364 46.105.14.53
357 130.237.218.86
273 75.97.9.59
113 50.16.19.13

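The regular expression is made of four repetitions of [0-9]{1,3}, which matches 1 to 3 adjacent digits, separated by \. which is a literal dot (remember that a dot without the escape backslash matches any character). The -E option enables extended regular expressions, so that {1,3} is read as a repetition count, and -o prints only the matching part of each line instead of the whole line. The following sort and uniq -c do the counting: uniq -c collapses identical adjacent lines into one, prefixed by the number of occurrences, which is why the addresses have to be sorted first. The second sort -nr orders the list again, this time numerically (-n), which sorts by the count since it is at the beginning of each line, and in reverse order (-r), from the biggest number down to 1. Finally, head -n 5 keeps the top 5 lines of the list. Note that the pattern does not check that each number is at most 255, so it would also match a string such as 999.999.999.999; the solution relies on the log containing only well-formed addresses.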
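To see why the first sort is needed, the pipeline can be run on a few made-up lines. This is a minimal sketch, assuming bash; top_ips and ip_regex are hypothetical names, and the sample lines are invented stand-ins, not data from simple.log.

```shell
#!/usr/bin/env bash
# The Exercise 16.03 pipeline wrapped in a hypothetical function that reads
# log lines from stdin instead of from simple.log.
ip_regex='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

top_ips() {
    local n=${1:-5}   # how many addresses to keep (default 5)
    grep -Eo "$ip_regex" | sort | uniq -c | sort -nr | head -n "$n"
}

# Made-up sample lines, not real log data.
sample='10.0.0.1 GET /a
10.0.0.2 GET /b
10.0.0.1 POST /c
10.0.0.3 GET /d
10.0.0.1 GET /e
10.0.0.2 GET /f'

# Without the first sort, uniq -c only merges identical *adjacent* lines:
# no two neighbours are equal here, so every line gets a count of 1
# and 10.0.0.1 shows up three times.
printf '%s\n' "$sample" | grep -Eo "$ip_regex" | uniq -c

# With sort first, each address gets one total count: 3 for 10.0.0.1,
# 2 for 10.0.0.2 (10.0.0.3 is cut off by head).
printf '%s\n' "$sample" | top_ips 2
```

On the real file, top_ips 5 < simple.log runs the same pipeline as the solution.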


Exercise 16.04

The file simple.log contains the HTTP method used in the request (for example GET or POST), followed by a space and the rest of the log line. For each request that uses GET, print the HTTP method and everything that follows it.

Solution